Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input
Non-autoregressive translation (NAT) models, which remove the dependence on
previous target tokens from the inputs of the decoder, achieve a significant
inference speedup, but at the cost of inferior accuracy compared to
autoregressive translation (AT) models. Previous work shows that the quality of
the inputs of the decoder is important and largely impacts the model accuracy.
In this paper, we propose two methods to enhance the decoder inputs so as to
improve NAT models. The first one directly leverages a phrase table generated
by conventional SMT approaches to translate source tokens to target tokens,
which are then fed into the decoder as inputs. The second one transforms
source-side word embeddings to target-side word embeddings through
sentence-level alignment and word-level adversarial learning, and then feeds the
transformed word embeddings into the decoder as inputs. Experimental results
show that our methods largely outperform the NAT baseline~\citep{gu2017non}
in BLEU score on the WMT14 English-German and WMT16 English-Romanian tasks.

Comment: AAAI 2019
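The abstract describes the second method only at a high level. As a hedged illustration of that idea, the sketch below maps source-side embeddings into the target embedding space with a linear map trained adversarially against a discriminator, in the spirit of word-level adversarial embedding alignment. Every name, dimension, and training detail here is an assumption made for illustration, not the paper's implementation.

```python
# Hedged sketch: adversarial mapping of source embeddings into the
# target embedding space (all names/sizes are illustrative assumptions).
import torch
import torch.nn as nn

d = 512                                    # embedding dimension (assumed)
mapper = nn.Linear(d, d, bias=False)       # W: source space -> target space
disc = nn.Sequential(                      # discriminator: real target
    nn.Linear(d, 256), nn.LeakyReLU(),     # embedding vs. mapped source
    nn.Linear(256, 1))                     # embedding?

opt_m = torch.optim.Adam(mapper.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

src = torch.randn(64, d)                   # toy batch of source embeddings
tgt = torch.randn(64, d)                   # toy batch of target embeddings

# Discriminator step: distinguish real target embeddings from mapped ones.
d_loss = (bce(disc(tgt), torch.ones(64, 1))
          + bce(disc(mapper(src).detach()), torch.zeros(64, 1)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Mapper step: fool the discriminator so that mapped source embeddings
# can stand in for target-side decoder inputs.
m_loss = bce(disc(mapper(src)), torch.ones(64, 1))
opt_m.zero_grad(); m_loss.backward(); opt_m.step()
```

In a full system, the trained mapper would be applied to every source token embedding and the results fed to the NAT decoder in place of copied encoder inputs.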
Ground state properties of a multi-component bosonic mixture: a Gutzwiller mean-field study
Using the single-site Gutzwiller method, we theoretically study the ground
state and the interspecies entanglement properties of interexchange symmetric
multi-component (two- and three-) bosonic mixtures in an optical lattice, and
the results are generalized to an $n$-component ($n \geq 2$) system. We
compute the mean-field phase diagram, the interspecies entanglement entropy,
and the ground state spectral decomposition. Three phases are observed,
namely the $n$-component superfluid state (nSF), the $n$-component Mott
insulator state (nMI), and the super-counter-fluid state (SCF).
Interestingly, we find SCF lobes separating every two neighboring nMI lobes
in the phase diagram. More importantly, we derive the exact general expression
of the interspecies entanglement entropy for the SCF phase. In addition, we
also investigate the demixing effect of an n-component mixture and demonstrate
that the mixing-demixing critical point is independent of $n$.

Comment: 12 pages, 6 figures
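For context, a minimal sketch of the single-site Gutzwiller method named in the abstract, written in its standard textbook form; the notation ($f$, $m_a$) is illustrative and not necessarily the paper's.

```latex
% Single-site Gutzwiller ansatz for an n-component Bose-Hubbard mixture
% (standard form; notation is an assumption, not taken from the paper).
\[
  |\Psi\rangle = \prod_{i} |\phi_i\rangle, \qquad
  |\phi_i\rangle = \sum_{m_1,\dots,m_n} f^{(i)}_{m_1 \dots m_n}\,
  |m_1,\dots,m_n\rangle_i ,
\]
% where m_a is the occupation of component a at site i, and the
% variational coefficients f are optimized to minimize the mean-field
% energy. The interspecies entanglement entropy then follows from the
% reduced density matrix of a single component:
\[
  \rho_1 = \operatorname{Tr}_{2,\dots,n} |\phi\rangle\langle\phi| , \qquad
  S_1 = -\operatorname{Tr}\, \rho_1 \ln \rho_1 .
\]
```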
STL-SGD: Speeding Up Local SGD with Stagewise Communication Period
Distributed parallel stochastic gradient descent algorithms are workhorses
for large scale machine learning tasks. Among them, local stochastic gradient
descent (Local SGD) has attracted significant attention due to its low
communication complexity. Previous studies prove that the communication
complexity of Local SGD with a fixed or an adaptive communication period is in
the order of $O(N^{\frac{3}{2}} T^{\frac{1}{2}})$ and
$O(N^{\frac{3}{4}} T^{\frac{3}{4}})$ when the data distributions on clients
are identical (IID) or otherwise (Non-IID), where $N$ is the number of
clients and $T$ is the number
of iterations. In this paper, to accelerate the convergence by reducing the
communication complexity, we propose \textit{ST}agewise \textit{L}ocal
\textit{SGD} (STL-SGD), which increases the communication period gradually
along with decreasing learning rate. We prove that STL-SGD can keep the same
convergence rate and linear speedup as mini-batch SGD. In addition, as the
benefit of increasing the communication period, when the objective is strongly
convex or satisfies the Polyak-\L ojasiewicz condition, the communication
complexity of STL-SGD is $O(N \log T)$ and $O(N^{\frac{3}{2}} \log T)$ for
the IID case and the Non-IID case respectively, achieving
significant improvements over Local SGD. Experiments on both convex and
non-convex problems demonstrate the superior performance of STL-SGD.

Comment: Accepted by AAAI 2021
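The core loop is simple to sketch. Below is a hedged toy version of stagewise Local SGD in which each stage halves the learning rate and doubles the communication period; the exact schedule, stage lengths, and function names are assumptions for illustration, not the paper's specification.

```python
# Toy sketch of stagewise Local SGD (schedule and names are assumed).
import numpy as np

def stl_sgd(grad, x0, n_clients=8, stages=5, steps_per_stage=100,
            lr0=0.1, period0=1, rng=np.random.default_rng(0)):
    x = [x0.copy() for _ in range(n_clients)]   # per-client iterates
    lr, period = lr0, period0
    for s in range(stages):
        for t in range(steps_per_stage):
            for c in range(n_clients):          # local SGD step per client
                x[c] -= lr * grad(x[c], c, rng)
            if (t + 1) % period == 0:           # periodic averaging round
                avg = np.mean(x, axis=0)
                x = [avg.copy() for _ in range(n_clients)]
        lr, period = lr / 2, period * 2         # stagewise schedule
    return np.mean(x, axis=0)

# Toy objective: client c minimizes ||x - c||^2 with gradient noise.
g = lambda x, c, rng: 2 * (x - c) + 0.1 * rng.standard_normal(x.shape)
print(stl_sgd(g, np.zeros(3)))
```

Because the period grows while the learning rate shrinks, later stages trigger far fewer averaging rounds, which is where the communication savings come from.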
DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation
While Diffusion Generative Models have achieved great success on image
generation tasks, how to efficiently and effectively incorporate them into
speech generation, especially translation tasks, remains a non-trivial problem.
Specifically, due to the low information density of speech data, the
transformed discrete speech unit sequence is much longer than the corresponding
text transcription, posing significant challenges to existing auto-regressive
models. Furthermore, it is not optimal to naively apply discrete diffusion on
the speech unit sequence while disregarding the continuous space structure,
which will degrade the generation performance significantly. In this paper, we
propose a novel diffusion model by applying the diffusion forward process in
the \textit{continuous} speech representation space, while employing the
diffusion backward process in the \textit{discrete} speech unit space. In this
way, we preserve the semantic structure of the continuous speech representation
space in the diffusion process and integrate the continuous and discrete
diffusion models. We conduct extensive experiments on the textless direct
speech-to-speech translation task, where the proposed method achieves
comparable results to the computationally intensive auto-regressive baselines
(500 steps on average) with significantly fewer decoding steps (50 steps).

Comment: Accepted to the EMNLP 2023 main conference
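To make the continuous-forward / discrete-backward idea concrete, here is a hedged sketch: noise is added in the continuous unit-embedding space, and each reverse step snaps the denoised estimate back to the nearest discrete speech unit via a codebook lookup. The codebook, shapes, noise schedule, and the stubbed denoiser are all illustrative assumptions, not the paper's model.

```python
# Hedged sketch of continuous-space forward diffusion with discrete-space
# rounding in the reverse process (all details are assumptions).
import torch

V, d, T = 1000, 256, 50                    # units, embed dim, diffusion steps
codebook = torch.randn(V, d)               # unit embeddings (assumed learned)
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1 - betas, 0)

def q_sample(x0, t):
    """Forward process in *continuous* space: x_t ~ N(sqrt(ab)*x0, (1-ab)I)."""
    ab = alphas_bar[t]
    return ab.sqrt() * x0 + (1 - ab).sqrt() * torch.randn_like(x0)

def round_to_units(x):
    """Project continuous vectors onto nearest codebook entries (discrete)."""
    ids = torch.cdist(x, codebook).argmin(-1)
    return ids, codebook[ids]

# One reverse step, with the trained denoiser stubbed as identity:
denoise = lambda x, t: x                   # placeholder for the real model
x_t = q_sample(codebook[torch.randint(V, (8,))], T - 1)   # noisy batch of 8
x0_hat = denoise(x_t, T - 1)               # predict clean embeddings
unit_ids, x0_snapped = round_to_units(x0_hat)             # back to units
```

Snapping to the codebook at each reverse step is what keeps the trajectory anchored to valid speech units while the diffusion itself respects the continuous geometry of the representation space.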